Method and system for manually maintaining item authority

ABSTRACT

An item authority system is provided. The item authority system uses rules to identify item definitions that match or potentially match an item description. When a unique match is found, then the item authority system may indicate that the item description describes the same item as the item definition. If multiple matches or only potential matches are identified, then the item authority system may allow a user to manually indicate which item definition matches.

This application is a continuation of U.S. application Ser. No. 10/350,143, filed Jan. 22, 2003, now U.S. Pat. No. 8,671,120, which is hereby incorporated by reference in its entirety.

TECHNICAL FIELD

The described technology relates generally to matching item descriptions to item definitions.

BACKGROUND

Some electronic commerce web sites allow many merchants to advertise and sell their products and services, referred to as items, through a single web site. The web site may maintain a product catalog that describes all the products of all the merchants that are available to be purchased through the web site. In addition, the web site may maintain a record of the inventory for each merchant. A customer who desires to purchase a product through the web site may browse or use a search engine to search the product catalog for a product of interest. When a product of interest is found, the web site may identify from an inventory table the merchants who have the product of interest in stock, their price for the product of interest, shipping terms, and so on. The customer can then purchase the product of interest from a merchant who, for example, is offering the lowest price. The web site may coordinate the collecting of payment information and shipping information from the customer. The web site then notifies the merchant, who ships the product according to the shipping information. The web site may collect the payment from a financial institution, such as a credit card company. The web site may then keep a commission on the sale of the product and pay the rest to the merchant. The web site may also update the inventory for the merchant to reflect the sale of the product.

These web sites may provide various services that allow a merchant to update its inventory information maintained by the web site. For example, the web site may provide a bulk loader that uploads a file of current inventory information from a merchant. The bulk loader may scan the inventory information to ensure that it is in the correct format before updating the existing inventory information for that merchant. The inventory information needs to be mapped to the product in the product catalog associated with each product whose inventory information is being uploaded. For example, if a product is a book, then the inventory information may include the international standard book number (“ISBN”) for that book. The product catalog will contain an entry with that ISBN that describes the book. If each product in inventory is mapped to a product in the product catalog, then, when a customer browses the product catalog, the web site can identify and display the corresponding inventory of that product.

Unfortunately, there may not be an industry standard way to uniquely identify each product that is in inventory. As a result, the uploaded inventory information may not correctly or uniquely be associated with a product in the product catalog. The uploaded inventory information may, nevertheless, include attributes of the product, such as title, publisher, manufacturer, part number, and price, that may help to uniquely identify a corresponding product in the product catalog. Even if there were an industry standard way to uniquely identify each product, the merchants might not use that unique identification in their own databases and thus would have no easy way of providing that unique identification when uploading their inventory information. In addition, since there may be hundreds of thousands of products in the product catalog, even if each merchant attempted to use the unique identification, it would be expected that various errors in the identification itself would occur. For example, a data entry operator for a merchant may mistype the unique identification or simply enter the wrong identification. In addition, merchants may attempt to upload information relating to new products that have not yet been defined in the product catalog. It would be desirable to have a system that would automatically match products in inventory to products in the product catalog and, when automatic matching is not possible, would facilitate the manual matching of products in inventory to products within the product catalog.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram illustrating the subsystems of the item authority system in one embodiment.

FIG. 2 is a block diagram illustrating components of the matching engine subsystem in one embodiment.

FIG. 3 is a block diagram illustrating the components of the manual matching subsystem in one embodiment.

FIG. 4 is a flow diagram illustrating the processing of the item matcher component in one embodiment.

FIG. 5 is a flow diagram illustrating the processing of the match to item definitions component in one embodiment.

FIG. 6 is a flow diagram illustrating the processing of the identify candidate item definitions component in one embodiment.

FIG. 7 is a flow diagram illustrating the processing of the identify rule matches component in one embodiment.

FIG. 8 is a flow diagram illustrating the processing of the identify filter matches component in one embodiment.

FIG. 9 is a flow diagram illustrating the processing of the apply filter component in one embodiment.

FIG. 10 is a display page illustrating the processing of unmatched item descriptions of the resolution queue in one embodiment.

FIG. 11 is a display page illustrating the process of searching for an item definition using the match component.

FIG. 12 is a display page illustrating item definition search results in one embodiment.

FIG. 13 is a display page illustrating the comparison of item definitions prior to merging.

FIG. 14 is a display page illustrating matching of unmatched item descriptions to an item definition in one embodiment.

FIG. 15 is a display page illustrating entry of a search specification for unmatched item descriptions in one embodiment.

FIG. 16 is a display page illustrating the results of a search for unmatched item descriptions in one embodiment.

FIG. 17 is a display page illustrating possible match item definitions for an unmatched item description.

DETAILED DESCRIPTION

A method and system for automatically identifying item definitions that match item descriptions is provided. In one embodiment, the item authority system provides a matching engine that receives an item description (e.g., corresponding to an item in inventory) and compares that item description to various item definitions (e.g., corresponding to products in the product catalog). The item definitions provide what are considered authoritative definitions of each item. For example, an item definition may include an industry standard unique identifier for the item or may include values for attributes (e.g., title) that uniquely identify the item. The matching engine identifies the item definition that is most similar to the item description as the matching item definition. The item definition may improve over time as the definition is augmented by additional information or corrected with more accurate information.

The matching engine, in one embodiment, uses rules that specify how the similarity between an item definition and an item description is to be determined. These rules specify how to calculate a similarity score for an item definition and an item description based on a comparison of their attributes. A rule may also specify a threshold similarity score that is to be met in order for an item definition to match an item description. For example, if the similarity score between an item definition and an item description is 0.8 and the threshold similarity score is 0.75, then the item definition and the item description may be considered a match. If the threshold similarity score is 0.9, however, then the item definition and the item description would not be considered a match. In one embodiment, a rule may specify multiple ways, referred to as “filters,” to calculate similarity scores. For example, if the item is a book, then one similarity score may be calculated based on the similarity between the ISBNs in the item description and the item definition, and another similarity score may be calculated based on the similarity between title and author attributes of the item description and item definition. In one embodiment, when a similarity score is calculated based on multiple attributes, the similarity score may be a weighted combination of the similarity scores for each attribute. For example, the weight for the title attribute may be 0.75 and the weight for the author attribute may be 0.25, which indicates that the similarity between the titles is more indicative of a match than the similarity between the authors. Each way to calculate a similarity score, that is, each filter, may have its own threshold similarity score that indicates whether the item description and item definition match. Thus, in this example, if the ISBNs are identical, then the item description and item definition may be considered a match. If, however, the ISBN is missing from the item description, then the item description and the item definition may be considered a match when the title and author are very similar.
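The weighted combination and threshold test described above can be sketched as follows. This is a minimal illustration only: the function names and the per-attribute scores (0.9 for title, 0.5 for author) are hypothetical, and the sketch assumes a threshold is "met" when the score is greater than or equal to it.

```python
def weighted_similarity(attribute_scores, weights):
    """Combine per-attribute similarity scores into one score.

    Both arguments are dicts keyed by attribute name. The weights are
    assumed (not required by the text) to sum to 1.0.
    """
    return sum(weights[name] * attribute_scores[name] for name in weights)


def is_match(score, threshold):
    # Assumption: "meeting" the threshold means score >= threshold.
    return score >= threshold


# Weights taken from the example in the text; the scores are made up.
weights = {"title": 0.75, "author": 0.25}
scores = {"title": 0.9, "author": 0.5}

combined = weighted_similarity(scores, weights)  # roughly 0.8
print(is_match(combined, 0.75))  # a 0.75 threshold is met
print(is_match(combined, 0.9))   # a 0.9 threshold is not met
```

With these illustrative inputs the combined score is about 0.8, which reproduces the 0.75 versus 0.9 threshold example given in the paragraph.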

When the matching engine identifies only one item definition with a similarity score above the threshold similarity score, then the matching engine indicates that the item definition matches the item description. In such case, the item described by the item description may be added to the inventory corresponding to the matching item definition. If, however, the matching engine identifies more than one item definition with a similarity score above the threshold similarity score, then the matching engine identifies the item description as ambiguous. In such case, the item authority system may provide a manual matching subsystem through which a user can resolve the ambiguity. If the similarity score of no item definition exceeds the threshold similarity score, but at least one item definition has a similarity score that indicates it is a potential match (i.e., above a nearly matched threshold similarity score), then the matching engine may provide the item description and the potential matching item definitions to the manual matching subsystem so that a user can manually identify a matching item definition, as appropriate. These potential matching item definitions are placed in a “resolution queue” waiting for manual intervention.

If the matching engine does not identify any potential matching item definitions, then the matching engine determines whether the item description qualifies as a new item for which a new item definition should be defined. If the item description qualifies, then a new item definition is added to the product catalog. If the item description does not qualify, then the matching engine may place it in the resolution queue waiting for manual intervention. In such a case, a user would need to manually indicate a matching item definition or indicate that a new item definition is to be defined. Alternatively, the matching engine may discard the item description if it does not match any item definition and does not qualify for addition to the product catalog. In this way, the item authority system automatically identifies the matching item definition for an item description when possible, and when not possible it provides potential matching item definitions to assist in manual matching or discards the item description. The collection of item definitions (e.g., a product catalog) may be considered the “item authority” since it is the authoritative definition of the items.

The item authority system, in one embodiment, includes a manual matching subsystem that facilitates the matching of item descriptions that cannot be automatically matched to item definitions. The matching engine provides to the manual matching subsystem the item descriptions for which a match cannot be identified along with any ambiguous matching or potential matching item definitions that have been identified. The manual matching subsystem provides a user interface that allows a user to view the unmatched item descriptions (i.e., those in the resolution queue) along with the ambiguous matching and potential matching item definitions. The user can request detailed information about the item definitions and manually indicate which item definition, if any, matches the item description. When none of the ambiguous or potential matching item definitions match the item description, the manual matching subsystem may allow the user to search for other item definitions that may match the item description. The manual matching subsystem may also allow a user to merge item definitions into a single item definition or to update existing item definitions with additional information. It is possible that multiple item definitions may refer to the same underlying item. This may occur, for example, because the matching engine incorrectly identified that an item description represented a new item definition. The manual matching subsystem may also allow the user to review the automatic matches made by the matching engine and to override the matches when appropriate. Thus, the item authority system provides for the automatic matching of item descriptions and item definitions when possible and, when not possible, provides for manual matching.

FIG. 1 is a block diagram illustrating the subsystems of the item authority system in one embodiment. The item authority system 100 includes matching engine subsystem 110, store item subsystem 120, and manual matching subsystem 130. One skilled in the art will appreciate that these subsystems represent one possible division of the functions of the item authority system. The functions of the item authority system may be divided into subsystems in different ways or not divided at all. The item authority system interacts with the item definition table 101 and the inventory table 102. The item definition table, which may correspond to a product catalog, contains the authoritative item definition of each item. The inventory table contains the inventory information for each item that is available to be purchased. The inventory information may include the name of the merchant, the price of the item, shipping information, a unique identifier of the matching item definition in the item definition table, and other information about the item. The matching engine subsystem receives item descriptions and attempts to identify the matching item definition. The matching engine then provides the item description to the store item subsystem along with the status of the item description and the identification of any matching or potentially matching item definitions. In one embodiment, the status may be set to matched, ambiguous, resolution queue, unmatched, or new. The status of matched indicates that the item description matches a unique item definition. The status of ambiguous indicates that the item description matches more than one item definition. The status of resolution queue indicates that one or more potential matching item definitions have been identified, but none qualifies as a match. In one embodiment, the ambiguous and resolution queue statuses may be combined into a single status since the distinction between multiple matching item definitions and one or more potential matching item definitions may not be important to a user who is using the manual matching subsystem. The status of new means that the item description represents a new item definition that can be automatically added to the item definition table. The status of unmatched means that no matching or potentially matching item definitions have been found and the item description does not meet the criteria for automatically creating a new item definition. The store item subsystem receives the item description, status, and matching or potential matching item definitions from the matching engine subsystem. The store item subsystem updates the inventory table based on the item descriptions indicating whether a match has been found and adds new entries to the item definition table for the item descriptions with a status of new. The store item subsystem passes a list of (or an identification of) the unmatched item descriptions to the manual matching subsystem. The manual matching subsystem provides a user interface that allows a user to manually match item descriptions in the resolution queue to item definitions and to perform other tasks as discussed below in more detail.

FIG. 2 is a block diagram illustrating components of the matching engine subsystem in one embodiment. The matching engine subsystem 110 includes components 201-208 that perform the matching functions, rules table 211, and indexes 212. In one embodiment, the components are implemented as a computer program that executes on a computer system. The organization of these components illustrates one possible organization. One skilled in the art will appreciate that the function of the matching engine can be organized into components in many different ways. The item authority system may be implemented on one or more computer systems that receive item descriptions via a communications link. Each computer system may include a central processing unit, memory, input devices (e.g., keyboard and pointing devices), output devices (e.g., display devices), and storage devices (e.g., disk drives). The memory and storage devices are computer-readable media that may contain instructions that implement the item authority system. In addition, the data structures and message structures include requests and responses that may be stored or transmitted via a data transmission medium such as a signal on the communications link. The matching engine subsystem, the store item subsystem, and the manual matching subsystem may each execute on a different computer system or may execute on the same computer system. One skilled in the art will appreciate that the particular arrangement of the item authority system can be customized to meet the performance goals of the system.

The rules table 211 of the matching engine contains various rules for each category of items. For example, the rules table may contain a set of rules for a book category of items and a separate set of rules for a consumer electronics category of items. In one embodiment, each rule contains one or more filters that specify a criterion for an item description to match an item definition. Each filter specifies one or more attributes and a scoring algorithm for quantitatively representing the similarity between an item description and an item definition.

The indexes 212 allow rapid access to entries of the item definition table 101 that match certain values. In one embodiment, an index for each attribute represented by a filter of a rule is generated. For example, if a rule for a category of books has a filter that specifies an ISBN attribute and a title attribute, then an index for the ISBN attribute and another index for the title attribute are created. The matching engine uses the indexes to quickly identify item definitions corresponding to a given attribute value. In one embodiment, the indexes may be updated on a periodic basis, such as daily, to reflect changes in the item definition table. With only periodic updating, the indexes may not represent the current state of the item definition table. In particular, item definitions may have been added to the item definition table after the indexes were last generated. Because the indexes may be out of date, after searching the indexes the matching engine may search all newly defined items in the item definition table (i.e., those items defined after the last index update) before adding a new item definition to the item definition table to ensure that a duplicate item definition for the same item is not created.
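The combination of a periodically rebuilt index and a scan of newly defined items might look like the following sketch. The dictionary layout of an item definition (an "id" key plus attribute keys) and both function names are assumptions made for illustration, and the index here supports exact-value lookup only.

```python
from collections import defaultdict


def build_index(definitions, attribute):
    """Map each value of `attribute` to the ids of definitions having it."""
    index = defaultdict(list)
    for definition in definitions:
        value = definition.get(attribute)
        if value is not None:
            index[value].append(definition["id"])
    return index


def find_by_value(index, recent_definitions, attribute, value):
    """Look up `value` in a possibly stale index, then scan the
    definitions added since the index was last rebuilt."""
    ids = list(index.get(value, []))
    for definition in recent_definitions:
        if definition.get(attribute) == value and definition["id"] not in ids:
            ids.append(definition["id"])
    return ids


# Index built at the last periodic update (hypothetical data).
catalog = [{"id": 1, "isbn": "111"}, {"id": 2, "isbn": "222"}]
isbn_index = build_index(catalog, "isbn")

# Definition added after the index was built.
recent = [{"id": 3, "isbn": "333"}]

# The stale index alone misses the new definition; the scan finds it,
# so a duplicate definition for ISBN "333" would not be created.
print(find_by_value(isbn_index, [], "isbn", "333"))
print(find_by_value(isbn_index, recent, "isbn", "333"))
```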

The item matcher component 201 controls the overall processing of the matching engine. The item matcher component receives an item description and outputs the item description and its status along with an indication of any matching or potentially matching item definitions. The item matcher component initially uses categorizer 202 to identify the category associated with the item description. The category is identified by looking at certain data fields in the item description. For example, the categorizer may examine a field representative of product type (e.g., consumer electronics, toys) and a second field representative of product sub-type (e.g., DVDs, stuffed animals). The values of these fields are mapped to different user-defined categories. The item matcher component then retrieves the rules for the identified category from the rules table 211. The item matcher component then invokes the match to item definitions component 203 to identify item definitions that match the item description based on the rules for that category.
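As an illustration of the categorizer step, the sketch below maps a product type field and a product sub-type field to a user-defined category. The field names ("product_type", "product_sub_type"), the category names, and the mapping entries are all hypothetical; the text does not specify them.

```python
# Hypothetical mapping from (product type, product sub-type) to a
# user-defined category name.
CATEGORY_MAP = {
    ("consumer electronics", "DVDs"): "dvd",
    ("toys", "stuffed animals"): "plush_toys",
}


def categorize(item_description, category_map, default=None):
    """Return the category for an item description, or `default`
    when the field values are not mapped to any category."""
    key = (
        item_description.get("product_type"),
        item_description.get("product_sub_type"),
    )
    return category_map.get(key, default)


description = {"product_type": "toys", "product_sub_type": "stuffed animals"}
print(categorize(description, CATEGORY_MAP))
```

The returned category name would then be the key used to retrieve that category's rules from the rules table.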

The item matcher component 201 sets the status for the item description based on the identified category rules and the matching and potential matching item definitions. The match to item definitions component initially identifies candidate item definitions using the identify candidate item definitions component 204. The candidate item definitions represent the set of item definitions that are selected for detailed analysis as to whether they match the item description. Since there may be hundreds of thousands of item definitions, it may be impractical to perform a detailed analysis on each item definition. The candidate item definitions represent an initial selection of item definitions that most likely include the matching item definitions, and which may be a fraction of the total number of item definitions in the item definition table 101. The identify candidate item definitions component, in one embodiment, uses a designated filter of each rule of the identified category to retrieve candidate item definitions based on the indexes. The match to item definitions component 203 then invokes the identify rule matches component 205 for each rule passing the candidate item definitions and receiving matching or potentially matching item definitions in return. When a matching item definition is identified, then the match to item definitions component returns that matching item definition without evaluating any other rule. If no matching item definition is found, then the match to item definitions component returns any potentially matching item definitions that were identified. As further described herein, the identify rule matches component applies a rule to the candidate item definitions to calculate similarity scores for the item definitions. The identify rule matches component 205 invokes the identify filter matches component 206 to calculate a similarity score for each candidate item definition for each filter. The identify filter matches component invokes the apply filter component 207 for each filter to calculate scores. The apply filter component invokes various scoring methods 208 designated by the filters to calculate a similarity score for an attribute. For example, there may be a scoring method for comparing attributes and generating a similarity score indicating the similarity. In one embodiment, the similarity scores vary between zero and one, with zero indicating very dissimilar and one indicating very similar.
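The rule-by-rule control flow of the match to item definitions component, which stops at the first rule that yields a match, can be sketched as below. The `apply_rule` callable stands in for the identify rule matches component and is a hypothetical interface: it is assumed to return a pair of lists, matching definitions and potentially matching definitions.

```python
def match_to_item_definitions(description, rules, candidates, apply_rule):
    """Apply rules in order and return (matches, potential_matches).

    As soon as one rule produces a match, the matches are returned and
    the remaining rules are not evaluated. Otherwise the potential
    matches collected from all rules are returned.
    """
    potential = []
    for rule in rules:
        matches, near_matches = apply_rule(rule, description, candidates)
        if matches:
            return matches, []
        for definition in near_matches:
            if definition not in potential:
                potential.append(definition)
    return [], potential


# Stand-in for the identify rule matches component (illustration only).
def demo_apply_rule(rule, description, candidates):
    if rule == "isbn":
        return [], ["definition-B"]
    return [], ["definition-B", "definition-D"]


print(match_to_item_definitions({}, ["isbn", "title"], [], demo_apply_rule))
```

In the demonstration neither rule produces a match, so the potential matches from both rules are returned with duplicates removed.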

Table 1 contains an example set of rules for the category of books inone embodiment.

TABLE 1
 1. <BOOKS resolution_threshold=".8">
 2.  <RULE name="isbn" search_size="100">
 3.   <FILTER>
 4.    <isbn method="wordmatch" criteria=".99" weight="1.00" present="true" valid="true"/>
 5.    <SIZE> 10 </SIZE>
 6.    <THRESHOLD> .99 </THRESHOLD>
 7.   </FILTER>
 8.   <FILTER>
 9.    <isbn method="wordmatch" criteria=".99" weight=".75" present="true" valid="true"/>
10.    <author method="edit_distance" criteria=".60" weight=".25" present="true" valid="true"/>
11.    <SIZE> 4 </SIZE>
12.    <THRESHOLD> .90 </THRESHOLD>
13.   </FILTER>
14.   <FILTER>
15.    <isbn method="wordmatch" criteria=".99" weight=".75" present="true" valid="true"/>
16.    <title method="edit_distance" criteria=".25" weight=".25" present="true" valid="true"/>
17.    <SIZE> 4 </SIZE>
18.    <THRESHOLD> .90 </THRESHOLD>
19.   </FILTER>
20.  </RULE>
21.  <RULE name="title" search_size="100">
22.   <FILTER>
23.    <title method="edit_distance" criteria=".25" weight=".5" present="true" valid="true"/>
24.    <product_type_id method="edit_distance" criteria=".99" weight=".50" present="true"
25.      valid="true"/>
26.    <SIZE> 100 </SIZE>
27.    <THRESHOLD> .60 </THRESHOLD>
28.   </FILTER>
29.   <FILTER>
30.    <title method="edit_distance" criteria=".25" weight=".30" present="true" valid="true"/>
31.    <author method="edit_distance" criteria=".25" weight=".25" present="true" valid="true"/>
32.    <brand method="edit_distance" criteria=".5" weight=".25" present="true" valid="true"/>
33.    <book_format method="edit_distance" criteria=".5" weight=".20" present="true"
34.      valid="true"/>
35.    <SIZE> 50 </SIZE>
36.    <THRESHOLD> .80 </THRESHOLD>
37.   </FILTER>
38.   <FILTER>
39.    <title method="edit_distance" criteria=".25" weight=".50" present="true" valid="true"/>
40.    <author method="edit_distance" criteria=".25" weight=".25" present="true" valid="true"/>
41.    <brand method="edit_distance" criteria=".5" weight=".25" present="true" valid="true"/>
42.    <SIZE> 50 </SIZE>
43.    <THRESHOLD> .90 </THRESHOLD>
44.   </FILTER>
45.  </RULE>
46.  <NEW>
47.   <isbn present="true" valid="true"/>
48.   <title present="true" valid="true"/>
49.   <brand present="true" valid="true"/>
50.   <UNIQUE>
51.    <ATTRIBUTE>isbn</ATTRIBUTE>
52.   </UNIQUE>
53.  </NEW>
54. </BOOKS>
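A rule set in the form of Table 1 could be loaded with a standard XML parser. The following sketch uses Python's `xml.etree.ElementTree` on an abbreviated excerpt (one rule, one filter, no line numbers, straight quotation marks); the resulting dictionary layout and the `parse_rules` name are assumptions for illustration and are not part of the described system.

```python
import xml.etree.ElementTree as ET

# Abbreviated excerpt in the style of Table 1.
RULES_XML = """
<BOOKS resolution_threshold=".8">
 <RULE name="isbn" search_size="100">
  <FILTER>
   <isbn method="wordmatch" criteria=".99" weight=".75" present="true" valid="true"/>
   <author method="edit_distance" criteria=".60" weight=".25" present="true" valid="true"/>
   <SIZE> 4 </SIZE>
   <THRESHOLD> .90 </THRESHOLD>
  </FILTER>
 </RULE>
</BOOKS>
"""


def parse_rules(xml_text):
    """Parse a category's rules into plain dictionaries."""
    root = ET.fromstring(xml_text)
    category = {
        "name": root.tag,
        "resolution_threshold": float(root.get("resolution_threshold")),
        "rules": [],
    }
    for rule in root.findall("RULE"):
        filters = []
        for flt in rule.findall("FILTER"):
            # Every child other than SIZE and THRESHOLD names an attribute.
            attributes = [
                {
                    "name": el.tag,
                    "method": el.get("method"),
                    "criteria": float(el.get("criteria")),
                    "weight": float(el.get("weight")),
                }
                for el in flt
                if el.tag not in ("SIZE", "THRESHOLD")
            ]
            filters.append({
                "attributes": attributes,
                "size": int(flt.findtext("SIZE")),
                "threshold": float(flt.findtext("THRESHOLD")),
            })
        category["rules"].append({
            "name": rule.get("name"),
            "search_size": int(rule.get("search_size")),
            "filters": filters,
        })
    return category


books = parse_rules(RULES_XML)
print(books["rules"][0]["filters"][0]["threshold"])
```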

The rules are specified using the extensible markup language (“XML”). In this example, the books category has a set of rules delimited by the books tags (lines 1 and 54). The category includes two rules (lines 2-20 and lines 21-45) along with a new item definition criterion (lines 46-53). The “books” tag (line 1) includes a “resolution_threshold” field that indicates the resolution queue threshold is 0.8 (i.e., a potentially matching threshold similarity score). That is, only those candidate item definitions whose similarity scores are above 0.8 are considered as potential matches. The first rule delineated by the “rule” tags (lines 2-20) contains three filters delineated by the “filter” tags (lines 3-7, 8-13, 14-19). The “rule” tag includes a “name” field and a “search_size” field (line 2). The name of the first rule is “isbn,” and the search size is 100. The search size for a rule indicates the number of the candidate item definitions that are identified from the item definition table. Each filter identifies one or more attributes that are to be used in calculating the similarity score. For example, the second filter (lines 8-13) indicates an attribute scoring technique for the ISBN attribute (line 9) and the author attribute (line 10) that is to be used in calculating the similarity score associated with that filter. The “criteria” field of the ISBN attribute indicates that the resulting attribute similarity score has to be above 0.99, or else the filter similarity score is set to zero. The “weight” field indicates the weight of the attribute similarity score given to this attribute when calculating the filter similarity score. In this example, the ISBN attribute has a weight of 0.75, and the author attribute has a weight of 0.25. Thus, the filter similarity score is calculated by adding 0.75 of the ISBN similarity score and 0.25 of the author similarity score. The “present” field indicates that this attribute needs to be present in the item definition to calculate a similarity score. The “valid” field indicates that this attribute value needs to be valid in the item definition to calculate a similarity score. The “present” and “valid” fields are also used to test the integrity of the data of an item description. If an item description does not have an attribute that should be present or if its attribute value is invalid when it should be valid, then the item authority system identifies no item definitions as matching or potentially matching for that filter. The “threshold” tag of a filter indicates the minimum filter similarity score to be considered a match. If the filter similarity score is above the threshold similarity score, then the item definition is designated a match. An attribute similarity score is calculated using the scoring method specified by the attribute tags. For example, the ISBN attribute has a “method” field that identifies that the “wordmatch” method is to be used when generating the attribute similarity score. The wordmatch method removes punctuation, creates tokens, and compares the collection of tokens in the item description and item definition to identify the intersection of the two token sets. Those skilled in the art will recognize that many other techniques exist for comparing two text strings, including, for example, comparing the strings for an exact match (“exactmatch” method), measuring how many changes are necessary to modify one string to equal the second string (“edit_distance” method), and determining the number, order, and identity of words that are detected (“order sequence” method). If the filter similarity score is above the resolution threshold similarity score, then the item definition is designated as potentially matching, assuming it has not already been designated as a match. The “size” tag of a filter indicates the maximum number of item definitions to be identified as having filter similarity scores above the threshold similarity score. The size tags and the search size fields thus place a limit on the number of item definitions that are checked by the matching engine. If a match is found for an item description, then any potentially matching item definitions can be disregarded. It is also possible that a single item definition may be indicated as a match or a potential match based on different filters. In such a case, the matching engine may set the similarity score for the item definition to the highest similarity score of a match, or, when a match is not found, to the highest similarity score of a potential match.
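The filter evaluation described above (per-attribute criteria, weights, filter threshold, and resolution threshold) can be sketched as follows. Several points are assumptions rather than statements from the text: the wordmatch score is taken to be the size of the token intersection divided by the size of the token union, tokens are lowercased, and wordmatch is used for the author attribute in place of an edit-distance method to keep the example short.

```python
import string

_PUNCTUATION = str.maketrans("", "", string.punctuation)


def wordmatch(a, b):
    """Token-set similarity in [0, 1] after removing punctuation.

    Assumption: score = |intersection| / |union| of the token sets.
    """
    tokens_a = set(a.translate(_PUNCTUATION).lower().split())
    tokens_b = set(b.translate(_PUNCTUATION).lower().split())
    if not tokens_a or not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def filter_score(description, definition, attributes):
    """Weighted filter score; zero if any attribute score is not above
    its criteria value."""
    total = 0.0
    for attr in attributes:
        score = attr["method"](description[attr["name"]], definition[attr["name"]])
        if score <= attr["criteria"]:
            return 0.0
        total += attr["weight"] * score
    return total


def classify(score, threshold, resolution_threshold):
    """Designate a definition as a match, a potential match, or neither."""
    if score > threshold:
        return "match"
    if score > resolution_threshold:
        return "potential"
    return "none"


# Second filter of the first rule in Table 1, with wordmatch substituted
# for the author scoring method (illustration only).
attributes = [
    {"name": "isbn", "method": wordmatch, "criteria": 0.99, "weight": 0.75},
    {"name": "author", "method": wordmatch, "criteria": 0.60, "weight": 0.25},
]
description = {"isbn": "0-13-110362-8", "author": "Brian Kernighan"}
definition = {"isbn": "0131103628", "author": "Kernighan, Brian"}

score = filter_score(description, definition, attributes)
print(classify(score, threshold=0.90, resolution_threshold=0.8))
```

Because punctuation is stripped before tokenizing, the hyphenated and unhyphenated ISBN strings in the example yield the same token.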

The new tag of the books rule indicates the criteria that an item description is to satisfy for it to be considered as representing a new item definition. In the example of Table 1, the ISBN, title, and brand tags should each be present in the item description and their attribute values should be valid. In addition, the value of the ISBN attribute of the item description should not be a duplicate of an ISBN that is already in the item definition table. If no matching or potential matching item definitions are found for an item description and the item description passes the new criteria, then the item authority system automatically adds a new item definition to the item definition table. Alternatively, the new item definition may be added only after confirmation by a system user.
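The “new” criteria test can be sketched as below. The `is_valid` callable is a hypothetical placeholder, since the text does not say how validity of an attribute value is determined, and the dictionary representation of descriptions and definitions is likewise assumed.

```python
def qualifies_as_new(description, required, unique, existing_definitions, is_valid):
    """Return True if the description may become a new item definition.

    required: attribute names that must be present and valid
    unique:   attribute names whose values must not already appear
              in an existing item definition
    """
    for name in required:
        value = description.get(name)
        if value is None or not is_valid(name, value):
            return False
    for name in unique:
        value = description.get(name)
        if any(d.get(name) == value for d in existing_definitions):
            return False
    return True


# Placeholder validity check: any non-blank value counts as valid.
def non_blank(name, value):
    return bool(str(value).strip())


existing = [{"isbn": "111", "title": "Old Book", "brand": "Acme"}]
candidate = {"isbn": "222", "title": "New Book", "brand": "Acme"}

# Criteria from lines 46-53 of Table 1.
print(qualifies_as_new(candidate, ["isbn", "title", "brand"], ["isbn"], existing, non_blank))
```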

FIG. 3 is a block diagram illustrating the components of the manual matching subsystem in one embodiment. The manual matching subsystem 130 includes components 301-304 that control the manual matching function, unmatched item description table 311, possibly matching item definition table 312, and change history table 313. The unmatched item description table 311, which corresponds to those items in the resolution queue, contains an entry for each item description that was not matched by the matching engine subsystem. The possibly matching item definition table 312 links each unmatched item description to each ambiguous and potential matching item definition identified by the matching engine subsystem 110. The change history table 313 contains an entry recording each change in the inventory table 102 or item definition table 101. The manual matching subsystem can be used to review and override the changes as appropriate. It will be appreciated that the definitive item definition can therefore change over time, as new information is added to the item definition or previously incorrect information in the item definition is corrected.

The manual matching component 301 provides a user interface through which a user can select the function of the item resolution component 302, the inventory match component 303, and the review history component 304. The item resolution component 302 allows a user to view the unmatched item descriptions whose status is resolution queue and match those item descriptions with one of the item definitions in the possibly matching item definition table 312. The inventory match component 303 allows a user to review item definitions and to match unmatched item descriptions of the inventory table (i.e., those item descriptions that have no potential matches) with item definitions. The review history component 304 allows a user to view and override changes that have been made to the inventory table and item definition table.

FIGS. 4-9 illustrate the flow diagrams of components of the matching engine subsystem in one embodiment. FIG. 4 is a flow diagram illustrating the processing of the item matcher component 201 in one embodiment. The item matcher component is passed an item description and returns a status for that item description along with an indication of any matching or potential matching item definitions and their similarity scores. In block 401, the component identifies the category of the item represented by the item description using the categorizer. In block 402, the component retrieves the rules for the identified category from the rules table. In block 403, the component invokes the match to item definitions component 203, passing the item description and the retrieved rules and receiving the identification of any matching and potentially matching item definitions in return. In blocks 404-407, the component determines the status for the item description. In decision block 404, if at least one item definition is designated as matching, then the component continues at block 405, else the component continues at block 406. In decision block 405, if only one item definition is designated as matching, then the item description matches only one item definition and the component returns a status of matched. If, however, more than one item definition is designated as matching, then the component returns the status of ambiguous. In decision block 406, if there is at least one item definition that is designated as potentially matching, then the component returns a status of resolution queue, else the component continues at block 407. In decision block 407, if the item description passed the “new” criteria, then the component returns a status of new, else the component returns a status of unmatched.
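The status decision of blocks 404-407 can be summarized in a few lines. The following is a minimal sketch, assuming the matching and potentially matching item definitions arrive as two lists and that the result of the “new” criteria test is already known; the function name and the status strings are illustrative.

```python
def determine_status(matching, potentially_matching, passes_new_criteria):
    """Mirror decision blocks 404-407 of FIG. 4 (illustrative sketch)."""
    if matching:                       # block 404: at least one match
        if len(matching) == 1:         # block 405: exactly one match
            return "matched"
        return "ambiguous"             # more than one match
    if potentially_matching:           # block 406: only potential matches
        return "resolution queue"
    if passes_new_criteria:            # block 407: "new" criteria satisfied
        return "new"
    return "unmatched"
```

Note that potential matches are consulted only when no actual match exists, which reflects the order of the decision blocks.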

FIG. 5 is a flow diagram illustrating the processing of the match to item definitions component 203 in one embodiment. The component is passed an indication of an item description and a set of rules and returns an indication of the matching or potentially matching item definitions. The component loops, selecting each of the passed rules, identifying candidate item definitions for the rule, and identifying matching or potentially matching item definitions in accordance with the rule. When evaluating a rule, if the component identifies one or more item definitions that actually match, the component returns those item definitions without evaluating any additional rules. If, however, the component identifies no actual matches, then the component evaluates all the rules and returns any potentially matching item definitions that were identified. In block 501, the component selects the next rule. In decision block 502, if all the rules have already been selected without identifying an actual matching item definition, then the component returns any potentially matching item definitions, else the component continues at block 503. In block 503, the component invokes the identify candidate item definitions component and receives a list of the candidate item definitions for the selected rule in return. In block 504, the component invokes the identify rule matches component to identify those candidate item definitions that match or potentially match the selected rule. In decision block 505, if one or more matching item definitions are identified, then the component returns those matching item definitions, else the component continues at block 506. In block 506, the component adds any item definitions that potentially match the rule to the collection of item definitions that potentially match the item description. The component then loops to block 501 to select the next rule.
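The rule loop of FIG. 5 stops at the first rule that yields an actual match. A minimal sketch follows; the two callables stand in for the identify candidate item definitions and identify rule matches components, and their signatures and return shapes are assumptions made for illustration.

```python
def match_to_item_definitions(description, rules,
                              identify_candidates, identify_rule_matches):
    """Return (matches, potential_matches) for an item description.

    identify_candidates(description, rule) -> list of candidates
    identify_rule_matches(description, rule, candidates)
        -> (matches, potential_matches)
    Both callables are hypothetical stand-ins for components 204 and 205.
    """
    potential = []
    for rule in rules:                                        # blocks 501-502
        candidates = identify_candidates(description, rule)   # block 503
        matches, rule_potential = identify_rule_matches(
            description, rule, candidates)                    # block 504
        if matches:                                           # block 505
            # An actual match ends the search; potential matches
            # gathered so far are disregarded.
            return matches, []
        potential.extend(rule_potential)                      # block 506
    return [], potential
```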

FIG. 6 is a flow diagram illustrating the processing of the identify candidate item definitions component 204 in one embodiment. In this illustrated embodiment, the component identifies item definitions based on the attributes identified in the first filter of the rule. That is, the component identifies item definitions whose attribute values exactly match the attribute values in the item description for any or all of the attributes specified in the first filter of the rule. For example, the second rule of Table 1 (lines 21-45) has the attributes of title and product_type_id in its first filter. Thus, this component will retrieve all item definitions (up to the search size) that exactly match one or both of those attributes. One skilled in the art will appreciate that many different techniques may be used to identify candidate item definitions. For example, candidate item definitions may be identified based on closeness of their attribute values rather than on exact matches, or candidate item definitions may be identified only if all attribute values are present. In blocks 601-603, the component loops, creating a query of the attribute values from the item description. It will be appreciated that any information retrieval system allowing recovery on fielded data is suitable for query execution. For example, the query may be a SQL query with a “where” clause for each attribute of the first filter of the rule. The query is then executed against the indexes. The query for the second rule in Table 1 might be:

SELECT item_definition_key
  FROM item_definition   -- table name is illustrative only
  WHERE title = 'Harry Potter'
     OR product_type_id = 'paperback'

In block 601, the component selects the next attribute of the first filter of the rule. In decision block 602, if all the attributes of the first filter have already been selected, then the component continues at block 604, else the component continues at block 603. In block 603, the component adds a “where” statement to the query indicating the attribute and the attribute value of the item description. The component then loops to block 601 to select the next attribute. In block 604, the component executes the query limited to the search size number of results specified by the rule. In one embodiment, the query is executed against the indexes generated by the matching engine. The component then returns the query results as the candidate item definitions.
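The query construction of blocks 601-604 can be sketched with Python's standard sqlite3 module. This is an illustration under stated assumptions: the table name, the use of SQLite, and the choice to combine the per-attribute conditions with OR (so that a definition matching any one attribute is a candidate) are not specified by the description above.

```python
import sqlite3


def build_candidate_query(table, filter_attributes, description, search_size):
    """Build a parameterized query for candidate item definitions.

    `table` and `filter_attributes` are assumed to come from trusted rule
    configuration, because SQL identifiers cannot be bound as parameters.
    """
    conditions = []
    params = []
    for attribute in filter_attributes:          # blocks 601-603
        if attribute in description:
            conditions.append(f"{attribute} = ?")
            params.append(description[attribute])
    # With no usable attribute, select nothing rather than everything.
    where = " OR ".join(conditions) or "0 = 1"
    sql = f"SELECT item_definition_key FROM {table} WHERE {where} LIMIT ?"
    params.append(search_size)                    # block 604: search size cap
    return sql, params


def find_candidates(conn, table, filter_attributes, description, search_size):
    """Execute the query and return the candidate keys."""
    sql, params = build_candidate_query(
        table, filter_attributes, description, search_size)
    return [row[0] for row in conn.execute(sql, params)]
```

Binding the attribute values as parameters avoids quoting problems with titles that contain apostrophes or other special characters.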

FIG. 7 is a flow diagram illustrating the processing of the identify rule matches component 205 in one embodiment. This component is passed the item description, a rule, and candidate item definitions and returns an indication of those item definitions that match or potentially match the item description. In block 701, the component selects the next filter of the rule. In one embodiment, the component may skip the first filter of the rule when the first filter is used to identify the candidate item definitions. In decision block 702, if all the filters have already been selected, then the component returns an indication of the matching and potentially matching item definitions, else the component continues at block 703. In block 703, the component invokes the identify filter matches component for the selected filter and receives an indication of the item definitions that match or potentially match based on the filter. In block 704, the component adds the returned item definitions that match or potentially match the filter to the collection of item definitions that match or potentially match for the rule and then loops to block 701 to select the next filter.

FIG. 8 is a flow diagram illustrating the processing of the identify filter matches component 206 in one embodiment. The component is passed an item description, a filter, and an indication of the candidate item definitions and returns a list of those candidate item definitions that match or potentially match and their similarity scores. The component also tests the integrity of the item description, and, if it does not pass the test, the component discards any matching or potentially matching item definitions. In blocks 801-805, the component loops, selecting each candidate item definition and applying the filter to it. In block 801, the component selects the next item definition from the candidate item definitions. In decision block 802, if all the candidate item definitions have already been selected, then the component continues at block 806, else the component continues at block 803. In block 803, the component invokes the apply filter component to calculate the similarity score for the selected item definition. In decision block 804, if the similarity score is above the threshold similarity score for that filter or above the resolution queue threshold similarity score for the category, then the component continues at block 805, else the component loops to select the next candidate item definition. In block 805, the component adds the selected item definition to the list of matching and potentially matching item definitions as appropriate and then loops to block 801 to select the next candidate item definition. In block 806, the component tests the integrity of the data of the item description as it relates to applying the passed filter. In one embodiment, the integrity test is defined by the present and valid fields of the attributes specified by the passed filter. If the present field is true for an attribute but the item description does not include that attribute or if the valid field is true for an attribute but the value for that attribute in the item description is not valid, then the item description fails the integrity test. One skilled in the art will appreciate that other integrity testing can be performed and that the testing could be performed before calculating the similarity scores. In decision block 807, if the item description passes the integrity test, the component continues at block 808, else the component returns an indication that none of the candidate item definitions match the item description. In block 808, the component sorts the list of matching or potentially matching item definitions based on similarity score. In block 809, the component selects matching and potentially matching item definitions with the highest similarity scores up to the size number (indicated by the size tag in the filter) and returns the selected item definitions.
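The processing of FIG. 8 can be sketched as follows. The dictionary layout of a filter (`threshold`, `size`, and an `attributes` list with `name`, `present`, and `valid` fields), the `apply_filter` and `is_valid` callables, and the decision to check validity only when a value is supplied are all assumptions made for this illustration.

```python
def passes_integrity(description, flt, is_valid):
    """Integrity test of block 806 based on the present and valid fields."""
    for spec in flt["attributes"]:
        name = spec["name"]
        if spec["present"] and name not in description:
            return False
        # Assumption: validity is only checked when a value is supplied.
        if (spec["valid"] and name in description
                and not is_valid(name, description[name])):
            return False
    return True


def identify_filter_matches(description, flt, candidates, apply_filter,
                            queue_threshold, is_valid):
    """Return up to flt["size"] entries of (definition, score, kind)."""
    scored = []
    for definition in candidates:                      # blocks 801-805
        score = apply_filter(description, definition, flt)
        if score > flt["threshold"]:
            scored.append((definition, score, "match"))
        elif score > queue_threshold:
            scored.append((definition, score, "potential"))
    if not passes_integrity(description, flt, is_valid):   # blocks 806-807
        return []
    scored.sort(key=lambda entry: entry[1], reverse=True)  # block 808
    return scored[:flt["size"]]                            # block 809
```

As the text notes, the integrity test could equally be run before scoring, which would avoid computing scores that are later discarded.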

FIG. 9 is a flow diagram illustrating the processing of the apply filter component 207 in one embodiment. In blocks 901-909, the component loops, selecting each attribute of the filter and calculating an attribute similarity score for that attribute and calculating a running filter similarity score for the filter. In block 901, the component selects the next attribute of the filter. In decision block 902, if all the attributes of the filter have already been selected, then the component returns the filter similarity score, else the component continues at block 903. In decision block 903, if the “present” field of the selected attribute is true, then the component continues at block 904, else the component continues at block 905. In decision block 904, if the attribute is present in the item definition, then the component continues at block 905, else the component returns a filter similarity score of zero. In decision block 905, if the “valid” field of the selected attribute is true, then the component continues at block 906, else the component continues at block 907. In decision block 906, if the attribute value is valid in the item definition, then the component continues at block 907, else the component returns a filter similarity score of zero. In block 907, the component applies the scoring method designated by the selected attribute to generate an attribute similarity score. In decision block 908, if the attribute similarity score is above the criteria value for that attribute, then the component continues at block 909, else the component returns a filter similarity score of zero. In block 909, the component adds the weighted attribute similarity score to the running filter similarity score and then loops to block 901 to select the next attribute.
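The weighted scoring loop of FIG. 9 can be sketched as below. The per-attribute fields (`name`, `present`, `valid`, `criteria`, `weight`, and a `score` callable), the `is_valid` callable, and the exact-match scorer are assumptions chosen for illustration; the described system may designate other scoring methods.

```python
def exact_match(description_value, definition_value):
    # One possible scoring method (assumption): 1.0 on equality, else 0.0.
    return 1.0 if description_value == definition_value else 0.0


def apply_filter(description, definition, flt, is_valid):
    """Return the filter similarity score for one item definition."""
    total = 0.0
    for spec in flt["attributes"]:                        # blocks 901-902
        name = spec["name"]
        if spec["present"] and name not in definition:    # blocks 903-904
            return 0.0
        if spec["valid"] and not is_valid(name, definition.get(name)):
            return 0.0                                    # blocks 905-906
        score = spec["score"](description.get(name),
                              definition.get(name))       # block 907
        if score <= spec["criteria"]:                     # block 908
            return 0.0
        total += spec["weight"] * score                   # block 909
    return total
```

A single attribute that fails its presence, validity, or criteria check forces the whole filter score to zero, so the weights matter only among attributes that all pass.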

FIGS. 10-17 are display pages illustrating the user interface of the manual matching subsystem in one embodiment. The manual matching subsystem includes an item resolution component, a match component, and a review component. The item resolution component allows a user to view item descriptions placed in the resolution queue by the matching engine. The item resolution component allows the user to match the unmatched item descriptions to one of the matching or potentially matching item definitions identified by the matching engine or to reject any of the identified item definitions as matches. The match component allows the user to search for and merge item definitions and to locate unmatched item descriptions that may match an item definition. The match component also allows the user to locate an unmatched item description, to search for potential matching item definitions, and to match an unmatched item description to an item definition. The manual matching subsystem may use the same rules as the matching engine and share components of the matching engine for calculating similarity scores. The manual matching subsystem may also allow a user to set a resolution similarity score for the resolution queue. The manual matching subsystem effectively discards any item definitions whose similarity score is not above the resolution similarity score. For example, if the resolution similarity score is 0.8, and the resolution queue contains an item description with three potentially matching item definitions with similarity scores of 0.5, 0.75, and 0.85, then the subsystem will discard the two item definitions with scores of 0.5 and 0.75 and report only the item definition with the score of 0.85. In this way, the user can effectively filter out item definitions that, based on the user's experience, have similarity scores that are too low to indicate a match.
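The resolution similarity score acts as a simple cutoff on the queue contents. A minimal sketch, assuming each potentially matching item definition is held as a (definition, score) pair; the function name is hypothetical.

```python
def filter_by_resolution_score(scored_definitions, resolution_score):
    """Keep only definitions whose score is above the resolution score."""
    return [(definition, score)
            for definition, score in scored_definitions
            if score > resolution_score]


# The example from the text: a resolution similarity score of 0.8.
queue_entry = [("def-1", 0.5), ("def-2", 0.75), ("def-3", 0.85)]
print(filter_by_resolution_score(queue_entry, 0.8))  # [('def-3', 0.85)]
```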

FIG. 10 is a display page illustrating the processing of unmatched item descriptions of the resolution queue in one embodiment. Display page 1000 includes attribute values 1001 for an unmatched item description of the resolution queue along with the attribute values 1002 of the matching and potentially matching item definitions identified by the matching engine. Each item definition includes a checkbox 1003 for selecting that item definition. The item definitions may also include the similarity scores 1004 calculated by the matching engine. A user may select the “compare” link 1006 to view a more detailed comparison of the unmatched item description and the selected item definitions. The user may select the “match” link 1007 to match a selected item definition to the unmatched item description. The user may select the “reject” link 1008 to indicate that none of the item definitions match and to remove the unmatched item description from the resolution queue. The item resolution component may allow the user to browse through and select the unmatched item descriptions in the resolution queue.

FIG. 11 is a display page illustrating the process of searching for an item definition using the match component. Display page 1100 includes a search specification area 1101. The user enters the search specification in the search specification area and then selects the “search” link 1102 to search for item definitions that match the search specification. This search may be performed against the indexes generated by the matching engine or the item definition table itself, and may use any of the scoring methods used to match attributes.

FIG. 12 is a display page illustrating item definition search results in one embodiment. Display page 1200 includes search specification area 1201 for displaying the original search and attribute values 1202 for each item definition that matches the search specification. Each item definition also includes a checkbox 1203. A user can view a more detailed comparison of the item definitions and possibly merge item definitions by selecting checkboxes and then selecting the “compare/merge” link 1204.

FIG. 13 is a display page illustrating the comparison of item definitions prior to merging. Display page 1300 includes attribute values 1301 for the item definitions that have been selected for comparison. Each item definition includes radio buttons 1302 for designating whether the item definition is to be the primary or surviving item definition of the merge, is to be merged with the primary item definition, or is not to be merged into the primary item definition. After selecting the radio buttons, the user selects the “merge” link 1303 to perform the merge as indicated. In one embodiment, when item definitions are merged, the item descriptions that were matched to the merged item definitions are set to match the primary item definition and the merged item definitions are removed from the item definition table, leaving only the primary item definition. In another embodiment, a user may select which attributes from the merged item definitions are to be added to the primary item definition and which attributes are to be discarded from the merged item definitions.
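The first merge embodiment described above can be sketched with two dictionaries. The representation is an assumption made for illustration: `definitions` maps an item definition key to its attributes, and `matches` maps an item description identifier to the key of its matched item definition.

```python
def merge_item_definitions(definitions, matches, primary_key, merged_keys):
    """Merge the definitions in merged_keys into the primary definition.

    Returns new (definitions, matches) dictionaries; the inputs are left
    unchanged. All names here are hypothetical.
    """
    merged = set(merged_keys) - {primary_key}
    # Re-point item descriptions from merged definitions to the primary one.
    new_matches = {
        description_id: (primary_key if key in merged else key)
        for description_id, key in matches.items()
    }
    # Remove the merged definitions, leaving only the primary definition.
    new_definitions = {
        key: attributes
        for key, attributes in definitions.items()
        if key not in merged
    }
    return new_definitions, new_matches
```

The second embodiment, in which the user chooses which attributes of the merged item definitions survive, would add a step that copies the selected attributes into the primary definition before the merged ones are removed.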

FIG. 14 is a display page illustrating matching of unmatched item descriptions to an item definition in one embodiment. Display page 1400 may be displayed after the user has selected an item definition and requested to view all unmatched item descriptions that may match this item definition. The display page includes attribute values 1401 of the item definition and attribute values 1402 of the unmatched item descriptions ordered according to their similarity score. The similarity scores may be calculated using the rules as described above. A user may view item descriptions that match the item definition by selecting the “view matching” link 1403. A user may find viewing the matching item description helpful in deciding whether an unmatched item description should be matched. The user can view a more detailed comparison of the item definition to the unmatched item descriptions by selecting unmatched item descriptions of interest and selecting the “compare” link 1404. The user may also match unmatched item descriptions to the item definition by selecting the unmatched item descriptions and selecting the “match” link 1405.

FIG. 15 is a display page illustrating entry of a search specification for unmatched item descriptions in one embodiment. Display page 1500 includes search specification area 1501 for entry of the search specification. After entry, the user selects the “search” link 1502 to perform the search.

FIG. 16 is a display page illustrating the results of a search for unmatched item descriptions in one embodiment. Display page 1600 includes the search specification area 1601 for displaying the original search and attribute values 1602 for the unmatched item descriptions that satisfy the search specification. The user can select the checkboxes of the item descriptions of interest and then the “select” link 1601 to view additional information about the item descriptions such as possible matching item definitions.

FIG. 17 is a display page illustrating possible matching item definitions for an unmatched item description. Display page 1700 includes item description area 1701 and attribute values 1702 for possible matching item definitions. The possible matching item definitions may be identified using the rules as described above. A user can select the “compare” link 1703 to compare the selected item definitions to the unmatched item description. The comparison may include the similarity scores for the item definitions. A user may select the “match” link 1704 to match a selected item definition to the item description.

From the foregoing, it will be appreciated that although embodiments of the item authority system have been described for purposes of illustration, various modifications may be made without deviating from the spirit and scope of the invention. For example, the manual matching system may be modified to allow the user to specify new item definitions that are based on a received and unmatched item description. Accordingly, the invention is not limited except by the appended claims.

1. A method in a computer system for matching item descriptions to item definitions, the method comprising: receiving an indication of item definitions that possibly match an item description; displaying an indication of attributes of the item description and the possibly matching item definitions; receiving from a user a selection of one of the possibly matching item definitions; and designating that the selected item definition matches the item description.
2. The method of claim 1 wherein the displaying includes displaying a similarity score for each item definition indicating its similarity to the item description.
3. The method of claim 1 wherein the possibly matching item definitions include ambiguously matching item definitions.
4. The method of claim 1 wherein the possibly matching item definitions include potentially matching item definitions.
5. The method of claim 1 wherein the item definitions are identified as possibly matching the item description based on rules that specify how to generate a similarity score based on similarity between the item definition and an item description.
6. The method of claim 5 wherein an item definition and an item description have attributes with values and wherein a rule indicates how to generate a similarity score based on similarity between the values of the attributes of the item definition and the item description.
7. The method of claim 5 wherein a rule includes one or more filters, each filter specifying how to generate a filter similarity score, wherein the similarity score for an item definition is the filter similarity score that indicates the item definition is most similar to the item description.
8. The method of claim 7 wherein an item definition and an item description have attributes with values and wherein a filter specifies an attribute scoring technique for one or more attributes that generates an attribute similarity score for that attribute and specifies how to combine the generated attribute similarity scores to generate a filter similarity score.
9. The method of claim 7 wherein a filter indicates a threshold filter similarity score wherein, when the filter similarity score generated in accordance with the filter meets the threshold filter similarity score, the item definition matches the item description.
10. The method of claim 5 wherein a rule includes one or more filters, each filter specifying how to identify item definitions that possibly match the item description.
11. The method of claim 1 including providing a potentially matching threshold similarity score for an item definition to be considered as potentially matching the item description.
12. The method of claim 1 wherein when more than one item definition has a similarity score that indicates it is a match, identifying those item definitions as ambiguously matching item definitions.
13. The method of claim 1 wherein the item definitions that possibly match an item description have similarity scores and discarding item definitions whose similarity score does not pass a resolution similarity score.
14. A computer system for manually matching item definitions to item descriptions, comprising: an item resolution component that allows a user to manually match possibly matching item definitions to item descriptions, the item definitions being automatically identified as possibly matching; and a match component that allows unmatched item descriptions representing items in inventory to be manually matched to item definitions.
15. The computer system of claim 14 including a review component that allows the automatic matching of item definitions to item descriptions to be reviewed by a user.
16. The computer system of claim 14 wherein the match component allows new item definitions to be manually defined based on unmatched item descriptions.
17. The computer system of claim 14 wherein the match component allows duplicate item definitions to be merged into a single item definition.
18. The computer system of claim 14 including a data structure that includes the identifications of possibly matching item definitions for item descriptions and similarity scores for the item definitions.
19. The computer system of claim 14 wherein the match component calculates a similarity score representing the similarity between an item definition and an item description.
20. The computer system of claim 19 wherein the similarity scores are calculated based on a rule that specifies one or more scoring metrics.
21.-42. (canceled)